Many challenges from the natural world can be formulated as graph matching problems. Previous deep learning-based methods mainly consider a full two-graph matching setting. In this work, we study the more general partial matching problem with multi-graph cycle consistency guarantees. Building on recent progress in deep learning on graphs, we propose a novel data-driven method (URL) for partial multi-graph matching, which uses an object-to-universe formulation and learns latent representations of abstract universe points. The proposed approach advances the state of the art on the semantic keypoint matching problem, evaluated on the Pascal VOC, CUB, and Willow datasets. Moreover, a set of controlled experiments on a synthetic graph matching dataset demonstrates the scalability of our method to graphs with a large number of nodes and its robustness to high partiality.
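The cycle consistency that an object-to-universe formulation provides can be illustrated with a small NumPy sketch. This is not the learned method from the abstract: the universe assignments are hand-picked, and the helper names `object_to_universe` and `pairwise_matching` are hypothetical. It only shows that pairwise matchings derived from shared universe points compose consistently, even when the graphs overlap only partially.

```python
import numpy as np

def object_to_universe(assignments, num_universe_points):
    """Binary (n_i x d) matrix: row r has a single 1 in column assignments[r].

    Assumes each universe point is used at most once per graph.
    """
    u = np.zeros((len(assignments), num_universe_points), dtype=int)
    u[np.arange(len(assignments)), assignments] = 1
    return u

def pairwise_matching(u_i, u_j):
    """Partial matching between two graphs: nodes match if they share a universe point."""
    return u_i @ u_j.T

# Hand-picked (hypothetical) assignments of three graphs to 5 universe points.
u_a = object_to_universe([0, 1, 2], 5)
u_b = object_to_universe([1, 2, 4], 5)
u_c = object_to_universe([0, 2, 3], 5)

p_ab = pairwise_matching(u_a, u_b)
p_bc = pairwise_matching(u_b, u_c)
p_ac = pairwise_matching(u_a, u_c)

# Partial cycle consistency: going A -> B -> C never contradicts the direct A -> C matching.
is_cycle_consistent = bool(np.all(p_ab @ p_bc <= p_ac))
```

Because every pairwise matching is a product of the same object-to-universe matrices, consistency holds by construction rather than being enforced as an extra constraint.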
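3D shape matching is a long-standing problem in computer vision and computer graphics. While deep neural networks have been shown to yield state-of-the-art shape matching results, existing learning-based approaches are limited in the context of multi-shape matching: (i) they either focus only on matching pairs of shapes and therefore suffer from cycle-inconsistent multi-matchings, or (ii) they require an explicit template shape to match a collection of shapes. In this paper, we present a novel approach for deep multi-shape matching that ensures cycle-consistent multi-matchings without depending on an explicit template shape. To this end, we use a shape-to-universe multi-matching representation, which we combine with powerful functional map regularisation, so that our multi-shape matching neural network can be trained in a fully unsupervised manner. The functional map regularisation is considered only at training time; functional maps are not computed when predicting correspondences, which allows for fast inference. We demonstrate that our method achieves state-of-the-art results on several challenging benchmark datasets and, most remarkably, that our unsupervised method even outperforms recent supervised methods.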
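The graph matching optimization problem is an essential component of many tasks in computer vision, such as bringing two deformable objects into correspondence. Naturally, a wide range of applicable algorithms has been proposed over the last decades. Since no common standard benchmark has been developed, their performance claims are often hard to verify, because evaluations on differing problem instances and criteria make the results incomparable. To address these shortcomings, we present a comparative study of graph matching algorithms. We create a uniform benchmark in which we collect and categorize a set of existing, publicly available computer vision graph matching problems in a common format. At the same time, we collect and categorize the most popular open-source implementations of graph matching algorithms. Their performance is evaluated in a way that is in line with best practices for comparing optimization algorithms. The study is designed to be reproducible and extensible so that it can serve as a valuable resource in the future. Our study provides three notable insights: 1.) popular problem instances are exactly solvable in less than 1 second and are therefore insufficient for future empirical evaluations; 2.) the most popular baseline methods are markedly inferior to the best available methods; 3.) despite the NP-hardness of the problem, instances coming from vision applications can be solved in a few seconds, even for graphs with more than 500 vertices.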
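We address the problem of minimizing a class of energy functions consisting of data and smoothness terms, which commonly occur in machine learning, computer vision, and pattern recognition. While discrete optimization methods can provide theoretical optimality guarantees, they can only handle a finite number of labels and therefore suffer from label discretization bias. Existing continuous optimization methods can find sublabel-accurate solutions, but they are not efficient for large label spaces. In this work, we propose an efficient sublabel-accurate method that combines the best properties of continuous and discrete models. We split the problem into two sequential steps: (i) global discrete optimization to select the label range, and (ii) efficient continuous sublabel-accurate local refinement of a convex approximation of the energy function within the chosen range. Doing so allows us to improve time and memory efficiency while keeping the accuracy practically at the same level as continuous convex relaxation methods and, in addition, providing theoretical optimality guarantees at the level of discrete methods. Finally, we show the flexibility of the proposed approach with respect to general pairwise smoothness terms, so that it is applicable to a wide range of regularizations. Experiments on the illustrative example of the image denoising problem demonstrate the properties of the method. Code to reproduce the experiments is available at https://github.com/nurlanov-zh/sublabel-accurate-alpha-expansion.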
Neural 3D implicit representations learn priors that are useful for diverse applications, such as single- or multiple-view 3D reconstruction. A major downside of existing approaches is that rendering an image requires evaluating the network multiple times per camera ray, so the high computational cost becomes a bottleneck for downstream applications. We address this problem by introducing a novel neural scene representation that we call the directional distance function (DDF). To this end, we learn a signed distance function (SDF) along with our DDF model to represent a class of shapes. Specifically, our DDF is defined on the unit sphere and predicts the distance to the surface along any given direction. Therefore, our DDF allows rendering images with just a single network evaluation per camera ray. Based on our DDF, we present a novel fast algorithm (FIRe) to reconstruct 3D shapes given a posed depth map. We evaluate our proposed method on 3D reconstruction from single-view depth images, where we empirically show that our algorithm reconstructs 3D shapes more accurately and is more than 15 times faster (per iteration) than competing methods.
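Finding shortest paths in a graph is relevant to numerous problems in computer vision and graphics, including image segmentation, shape matching, and the computation of geodesic distances on discrete surfaces. Traditionally, the concept of a shortest path is considered for graphs with scalar edge weights, which makes it possible to compute the length of a path by adding up the individual edge weights. However, graphs with scalar edge weights are severely limited in their expressivity, since edges are often used to encode much more complex interrelations. In this work, we compensate for this modelling limitation and introduce the novel graph-theoretic concept of a shortest path in a graph with matrix-valued edges. To this end, we define a meaningful way to quantify the path length for matrix-valued edges, and we propose a simple yet effective algorithm to compute the respective shortest path. While our formalism is universal and thus applicable to a wide range of settings in vision, graphics, and beyond, we focus on demonstrating its merits in the context of 3D multi-shape analysis.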
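For reference, the traditional scalar-weight setting that this abstract contrasts against can be sketched with Dijkstra's algorithm, where the length of a path is simply the sum of its edge weights. This is the classical baseline only, not the matrix-valued formulation the abstract introduces; the function name and the example graph are illustrative assumptions.

```python
import heapq

def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on a dict: node -> list of (neighbor, nonnegative weight)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = dist + weight  # path length is the sum of scalar edge weights
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")  # target not reachable from source

# Small illustrative graph: the detour a -> b -> c (1 + 2) beats the direct edge a -> c (4).
example_graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
length_a_to_c = shortest_path_length(example_graph, "a", "c")
```

The additive update `dist + weight` is exactly the step that has no direct counterpart once edges carry matrices instead of scalars.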
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, very few works explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into that model. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough to achieve a high attack success rate.
Machine learning models are typically evaluated by computing their similarity with reference annotations and trained by maximizing that similarity. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotating entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly well suited to high-dimensional directional data such as texts. In this article, we propose to estimate a von Mises mixture using an L1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on a real-data benchmark. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis.
Passive monitoring of acoustic or radio sources has important applications in modern convenience, public safety, and surveillance. A key task in passive monitoring is multiobject tracking (MOT). This paper presents a Bayesian method for multisensor MOT for challenging tracking problems where the object states are high-dimensional and the measurements follow a nonlinear model. Our method is developed in the framework of factor graphs and the sum-product algorithm (SPA). The multimodal probability density functions (pdfs) provided by the SPA are effectively represented by a Gaussian mixture model (GMM). To perform the operations of the SPA in high-dimensional spaces, we make use of particle flow (PFL). Here, particles are migrated towards regions of high likelihood based on the solution of a partial differential equation. This makes it possible to obtain good object detection and tracking performance even in challenging multisensor MOT scenarios with single-sensor measurements that have a lower dimension than the object positions. We perform a numerical evaluation in a passive acoustic monitoring scenario where multiple sources are tracked in 3-D from 1-D time-difference-of-arrival (TDOA) measurements provided by pairs of hydrophones. Our numerical results demonstrate favorable detection and estimation accuracy compared to state-of-the-art reference techniques.
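To make the measurement setup concrete, the following sketch computes a noise-free 1-D TDOA value from a 3-D source position and a pair of hydrophones. It is an illustration of the standard TDOA geometry, not the paper's measurement model: the nominal sound speed of 1500 m/s, the function name `tdoa`, and the example coordinates are all assumptions.

```python
import math

# Nominal speed of sound in sea water in m/s (assumed value, it varies with conditions).
SOUND_SPEED_WATER = 1500.0

def tdoa(source, sensor_a, sensor_b, speed=SOUND_SPEED_WATER):
    """Noise-free time difference of arrival in seconds between two sensors.

    Positive when the signal reaches sensor_b before sensor_a.
    """
    range_a = math.dist(source, sensor_a)
    range_b = math.dist(source, sensor_b)
    return (range_a - range_b) / speed

# A source equidistant from both hydrophones yields zero delay.
delay_symmetric = tdoa((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0))

# A source 3000 m from one hydrophone and 1500 m from the other.
delay_offset = tdoa((0.0, 0.0, 0.0), (3000.0, 0.0, 0.0), (1500.0, 0.0, 0.0))
```

A single such scalar constrains the source only to a hyperboloid in 3-D, which is why each measurement has a lower dimension than the object position and the model is nonlinear in the state.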